Jmp8 at SemEval-2017 Task 2: A simple and general distributional approach to estimate word similarity
نویسندگان
چکیده
We have built a simple corpus-based system to estimate words similarity in multiple languages with a count-based approach. After training on Wikipedia corpora, our system was evaluated on the multilingual subtask of SemEval-2017 Task 2 and achieved a good level of performance, despite its great simplicity. Our results tend to demonstrate the power of the distributional approach in semantic similarity tasks, even without knowledge of the underlying language. We also show that dimensionality reduction has a considerable impact on the results.
منابع مشابه
ConceptNet at SemEval-2017 Task 2: Extending Word Embeddings with Multilingual Relational Knowledge
This paper describes Luminoso’s participation in SemEval 2017 Task 2, “Multilingual and Cross-lingual Semantic Word Similarity”, with a system based on ConceptNet. ConceptNet is an open, multilingual knowledge graph that focuses on general knowledge that relates the meanings of words and phrases. Our submission to SemEval was an update of previous work that builds high-quality, multilingual wor...
متن کاملCitius at SemEval-2017 Task 2: Cross-Lingual Similarity from Comparable Corpora and Dependency-Based Contexts
This article describes the distributional strategy submitted by the Citius team to the SemEval 2017 Task 2. Even though the team participated in two subtasks, namely monolingual and crosslingual word similarity, the article is mainly focused on the cross-lingual subtask. Our method uses comparable corpora and syntactic dependencies to extract count-based and transparent bilingual distributional...
متن کاملRUFINO at SemEval-2017 Task 2: Cross-lingual lexical similarity by extending PMI and word embeddings systems with a Swadesh's-like list
The RUFINO team proposed a nonsupervised, conceptually-simple and lowcost approach for addressing the Multilingual and Cross-lingual Semantic Word Similarity challenge at SemEval 2017. The proposed systems were cross-lingual extensions of popular monolingual lexical similarity approaches such as PMI and word2vec. The extensions were possible by means of a small parallel list of concepts similar...
متن کاملDistributional semantics beyond words: Supervised learning of analogy and paraphrase
There have been several efforts to extend distributional semantics beyond individual words, to measure the similarity of word pairs, phrases, and sentences (briefly, tuples; ordered sets of words, contiguous or noncontiguous). One way to extend beyond words is to compare two tuples using a function that combines pairwise similarities between the component words in the tuples. A strength of this...
متن کاملRobust semantic text similarity using LSA, machine learning, and linguistic resources
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning augmented with data from se...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2017